
Binary Neural Architecture Search

Binary architecture search can be improved by an auxiliary binary architecture search [24].

BATS [24] designs a new search space specially tailored to binary networks and incorporates it into the DARTS framework.

Unlike the aforementioned methods, our work is driven by the performance discrepancy between a 1-bit neural architecture and its real-valued counterpart. We introduce tangent propagation to explore this accuracy discrepancy, and we further accelerate the search by applying the generalized Gauss-Newton (GGN) approximation to the Hessian matrix in the optimization. Furthermore, we introduce a novel decoupled optimization to address asynchronous convergence in such a differentiable NAS process, leading to better-performing 1-bit CNNs. The overall framework yields a novel and effective BNAS process.
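For context, the GGN approximation (standard background, not specific to this chapter) replaces the full Hessian of a composed loss with its first, positive-semidefinite term, which is cheaper to form and better behaved in optimization:

```latex
% Loss \mathcal{L}(y, f(x;\theta)) with network output f and Jacobian
% J = \partial f / \partial \theta. The chain rule gives the full Hessian
% with respect to the parameters \theta:
\nabla^2_{\theta}\mathcal{L}
  = J^{\top}\big(\nabla^2_{f}\mathcal{L}\big)J
  + \sum_{i}\frac{\partial \mathcal{L}}{\partial f_{i}}\,\nabla^2_{\theta} f_{i}
% The GGN drops the second (indefinite) term and keeps only the
% positive-semidefinite part:
G = J^{\top}\big(\nabla^2_{f}\mathcal{L}\big)J
```

Because $G$ is built from first-order Jacobians, it avoids second derivatives of the network itself, which is where the acceleration comes from.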

To present the advances in the NAS area, we review representative works in NAS and binary NAS separately in the following.

4.2 ABanditNAS: Anti-Bandit for Neural Architecture Search

Low search efficiency has kept NAS from practical use, and the introduction of adversarial optimization and a larger search space further exacerbates the issue. Early work directly regards network architecture search as a black-box optimization problem in a discrete search space and takes thousands of GPU days. To reduce the search space, a common idea is to adopt a cell-based search space [307]. However, when searching in a huge and complicated space, prior cell-based works may still suffer from memory issues and become computationally intensive as the number of meta-architectures grows. For example, DARTS [151] can only optimize over a small subset of 8 cells, which are then stacked to form a deep network of 20 cells. We reformulate NAS as a multi-armed bandit problem with a

vast search space to increase search efficiency. A multi-armed bandit algorithm aims to identify the best arm over a sequence of trials while balancing reward and uncertainty. Likewise, NAS aims to select the best operation from an operation pool at each edge of the model within a finite number of optimization steps; both are exploration-exploitation problems. We therefore introduce the multi-armed bandit algorithm into NAS. In addition, the multi-armed bandit algorithm avoids the gradient descent process and provides good search speed for NAS. Unlike traditional

Upper Confidence Bound (UCB) bandit algorithms, which prefer to sample using UCB and focus on exploration, we propose Anti-Bandit, which exploits both the UCB and the Lower Confidence Bound (LCB) to balance exploration and exploitation. We achieve an accuracy-bias trade-off during the search process for operation performance estimation. Using the test performance to quickly identify the optimal architecture is desirable. With the help of the Anti-Bandit algorithm, our Anti-Bandit NAS (ABanditNAS) [34] can handle a vast and complicated search space, where the number of operations that define the space can be 960!
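A minimal sketch of the Anti-Bandit idea (our simplification, not the authors' implementation): sample the arm with the worst LCB so that weak-looking operations are retested, and abandon the arm with the worst UCB once every remaining arm has enough trials. The function names, the confidence-bound constant `c`, and the `min_trials` threshold are illustrative assumptions.

```python
import math

def ucb(mean, n, t, c=2.0):
    """Upper confidence bound of an arm after n pulls at step t."""
    return mean + math.sqrt(c * math.log(t) / n)

def lcb(mean, n, t, c=2.0):
    """Lower confidence bound; untested arms get -inf so they are tried first."""
    if n == 0:
        return float("-inf")
    return mean - math.sqrt(c * math.log(t) / n)

def anti_bandit_prune(ops, evaluate, min_trials=10, keep=1):
    """Sample the arm with the worst LCB (weak-looking arms are retested),
    then abandon the arm with the worst UCB once every arm has enough trials.
    Illustrative sketch; `evaluate` stands in for a validation measurement."""
    stats = {op: {"mean": 0.0, "n": 0} for op in ops}
    t = 0
    while len(stats) > keep:
        t += 1
        # exploration step: retest the currently least-promising operation
        op = min(stats, key=lambda o: lcb(stats[o]["mean"], stats[o]["n"], t))
        reward = evaluate(op)
        s = stats[op]
        s["mean"] += (reward - s["mean"]) / (s["n"] + 1)  # running mean
        s["n"] += 1
        # pruning step: drop the worst-UCB operation, but only after every
        # remaining operation has been tried at least min_trials times
        if all(st["n"] >= min_trials for st in stats.values()):
            worst = min(stats, key=lambda o: ucb(stats[o]["mean"], stats[o]["n"], t))
            del stats[worst]
    return list(stats)
```

In ABanditNAS itself the analogous schedule runs independently over the operation pool at every edge of every cell, with rewards coming from mini-batch validation performance rather than a single scalar oracle.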

Specifically, our proposed Anti-Bandit algorithm uses UCB to reduce the search space, while LCB guarantees that every arm is tested thoroughly before being abandoned, as shown in Figure 4.1. Based on the observation that the operation that is optimal early on is not necessarily optimal in the end, whereas the worst operations in the early stage usually remain the worst in the end [291], we prune the operations with the worst UCB after enough trials, which are selected via the worst LCB. This means that the operations we finally retain are almost certainly a near-optimal solution. The more tests that are conducted, the closer the UCB and LCB are to the average value. Therefore, the LCB tends to increase and the UCB to decrease as the number of samples grows. Specifically, operations with poor performance in the early stages, such as parameterized operations, will receive more opportunities but are abandoned once